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Abstract 

We  derive  a  CUR  approximate  matrix  factorization  based  on  the  Discrete  Empirical  Interpolation  Method 
(DEIM).  For  a  given  matrix  A,  such  a  factorization  provides  a  low  rank  approximate  decomposition  of  the  form 
A  «  CUR,  where  C  and  R  are  subsets  of  the  columns  and  rows  of  A,  and  U  is  constructed  to  make  CUR  a 
good  approximation.  Given  a  low-rank  singular  value  decomposition  A  «  VSWT,  the  DEIM  procedure  uses 
V  and  W  to  select  the  columns  and  rows  of  A  that  form  C  and  R.  Through  an  error  analysis  applicable  to  a 
general  class  of  CUR  factorizations,  we  show  that  the  accuracy  tracks  the  optimal  approximation  error  within 
a  factor  that  depends  on  the  conditioning  of  submatrices  of  V  and  W.  For  very  large  problems,  V  and  W 
can  be  approximated  well  using  an  incremental  QR  algorithm  that  makes  only  one  pass  through  A.  Numerical 
examples  illustrate  the  favorable  performance  of  the  DEIM-CUR  method  compared  to  CUR  approximations 
based  on  leverage  scores. 


1  Introduction 

This  work  presents  a  new  CUR  matrix  factorization  based  upon  the  Discrete  Empirical  Interpolation  Method 
(DEIM).  A  CUR  factorization  is  a  low  rank  approximation  of  a  matrix  A  e  Rmxn  of  the  form  A  ~  CUR,  where 
C  =  A(:,  q)  e  Rmxk  is  a  subset  of  the  columns  of  A  and  R  =  A(p, :)  e  Rkxn  is  a  subset  of  the  rows  of  A. 
(We  generally  assume  m  >  n  throughout.)  The  k  x  k  matrix  U  is  constructed  to  assure  that  CUR  is  a  good 
approximation  to  A.  Assuming  the  best  rank-A:  singular  value  decomposition  (SVD)  A  ~  VSW7  is  available, 
the  algorithm  uses  the  DEIM  index  selection  procedure,  q  =  DEIM(V)  and  p  =  DEIM(W),  to  determine  C 
and  R.  The  resulting  approximate  factorization  is  nearly  as  accurate  as  the  best  rank-A;  SVD,  with 

||  A  -  CUR||  <  (riP  +  rjq)  ak+i, 

where  ak+i  is  the  first  neglected  singular  value  of  A,  rjp  =  ||  V (p, :  )-1||,  and  rjq  =  ||W(q, :  )-1||. 

Here  and  throughout.  ||  •  ||  denotes  the  vector  2-norm  and  the  matrix  norm  it  induces,  and  \\-\\f  is  the  Frobenius 
norm.  We  use  MATLAB  notation  to  index  vectors  and  matrices,  so  that,  e.g.,  A(p, :)  denotes  the  k  rows  of  A 
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whose  indices  are  specified  by  the  entries  of  the  vector  p  G  Nfc,  while  A(:,  q)  denotes  the  k  columns  of  A  indexed 

by  q  G  Nfc. 

The  CUR  factorization  is  an  important  tool  for  handling  large-scale  data  sets,  offering  two  advantages  over  the 
SVD:  when  A  is  sparse,  so  too  are  C  and  R,  unlike  the  matrices  V  and  W  of  singular  vectors;  and  the  columns 
and  rows  that  comprise  C  and  R  are  representative  of  the  data  (e.g.,  sparse,  nonnegative,  integer  valued,  etc.). 
The  following  simple  example,  adapted  from  Mahoney  and  Drineas  [22,  Fig.  lb],  illustrates  the  latter  advantage. 
Construct  A  6  R2xb  so  that  its  first  n/2  columns  have  the  form 


Xl 

X2  _ 

and  the  remaining  n/2  columns  have  the  form 

'  -1  1  ' 

Xl 

2 

1 

1 

X2 

where  in  both  cases  x\  ~  N{ 0, 1)  and  x-)  ~  Ar(0. 42)  are  independent  samples  of  normal  random  variables,  i.e., 
the  columns  of  A  are  drawn  from  two  different  multivariate  normal  distributions.  Figure  1  shows  that  the  two  left 
singular  vectors,  though  orthogonal  by  construction,  fail  to  represent  the  true  nature  of  the  data;  in  contrast,  the 
first  two  columns  selected  by  the  DEIM-CUR  procedure  give  a  much  better  overall  representation.  While  trivial 
in  this  two-dimensional  case,  one  can  imagine  the  utility  of  such  approximations  for  high-dimensional  data.  We 
shall  illustrate  the  advantages  of  CUR  approximations  with  further  computational  examples  in  Section  6. 

CUR- type  factorizations  originated  with  “pseudoskeleton”  approximations  [14]  and  pivoted,  truncated  QR 
decompositions  [23];  in  recent  years  many  new  algorithms  have  been  proposed  in  the  numerical  linear  algebra  and 
theoretical  computer  science  literatures.  Some  approaches  seek  to  maximize  the  volume  of  the  decomposition  [14, 
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Figure  1 :  Comparison  of  singular  vectors  (left,  scaled,  in  red)  and  DEIM-CUR  columns  (right,  in  blue)  for  a  data 
set  drawn  from  two  multivariate  normal  distributions  having  different  principal  axes. 
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25].  Numerous  other  algorithms  instead  use  leverage  scores  [5,  10,  22,  28].  These  methods  typically  first  compute 
a  singular  value  decomposition1  A  =  VSW7’  (or  an  approximation  to  it),  with  V  £  R'rtXn,  W  <G  Rnxn. 
The  leverage  score  for  the  jth  row  (Ath  column)  of  A  is  the  squared  two-norm  of  the  jth  row  of  V  (Ath  row 
of  W).  When  scaled  by  the  number  of  singular  vectors,  these  leverage  scores  give  probability  distributions  for 
randomly  sampling  the  columns  and  rows  to  form  C  and  R.  This  approach  leads  to  probabilistic  bounds  on 
||  A  —  CURII^  [10,  22].  In  cases  where  A  has  small  singular  values  (precisely  the  case  where  one  would  seek 
a  low-rank  factorization),  the  singular  vectors  can  be  sensitive  to  perturbations  to  A,  making  the  leverages  scores 
unstable  [18].  Thus  leverage  scores  are  often  computed  using  only  the  leading  few  singular  vectors,  but  the  choice 
of  how  many  vectors  to  keep  can  be  somewhat  ad  hoc. 

The  algorithm  described  in  Sections  2  and  3  is  entirely  deterministic  and  involves  few  (if  any)  parameters.  The 
method  is  supported  by  an  error  analysis  in  Section  4  that  also  applies  to  a  broad  class  of  CUR  factorizations.  This 
section  includes  an  improved  bound  on  the  error  constants  r]p  and  rjq  for  DEIM  row  and  column  selection,  which 
also  applies  to  the  analysis  of  DEIM -based  model  order  reduction  [6],  In  Section  5  we  propose  a  novel  incre¬ 
mental  QR  algorithm  for  approximating  the  SVD  (and  potentially  also  approximating  leverage  scores).  Section  6 
illustrates  the  performance  of  this  new  CUR  factorization  on  several  examples. 

In  many  applications  one  cares  primarily  about  key  columns  or  rows  of  A,  rather  than  an  explicit  A  =  CUR 
factorization.  The  DEIM  technique,  which  identifies  rows  and  columns  of  A  independently,  can  easily  be  used 
to  select  only  columns  or  rows,  leading  to  an  “interpolatory  decomposition”  of  the  form  A  =  CU  or  A  =  UR; 
such  factorizations  have  the  advantage  that  U  can  be  much  better  conditioned  than  the  U  matrix  in  the  CUR 
factorization.  For  further  details  about  general  interpolatory  decompositions,  see  [7,  §1], 

2  CUR  Factorization 

We  arc  concerned  with  large  matrices  A  £  Mmxn  that  represent  nearly  low-rank  data,  which  can  therefore  be 
expressed  as 

A  =  CUR  +  F,  (2.1) 

with  ||F||  small  relative  to  ||A||.  The  matrix  C  £  Rmx/l  is  formed  by  extracting  A  columns  from  A.  and  R  £  Rkxn 
from  A  rows  of  A.  The  selected  row  and  column  indices  arc  stored  in  the  vectors  p,  q  £  Nfc,  so  that  C  =  A(:,  q) 
and  R  =  A(p, :).  Our  choice  for  p  and  q  is  guided  by  knowledge  of  the  rank-A  SVD  (or  an  approximation  to  it). 
Before  detailing  the  method  for  selecting  these  indices,  we  discuss  how,  given  p  and  q,  one  should  construct  U  so 
that  CUR  satisfies  desirable  approximation  properties. 

As  motivation,  suppose  for  the  moment  that  A  has  exact  rank  A,  and  C  and  R  arc  full-rank  subsets  of  the 
columns  and  rows  of  A.  Now  let  Y  £  Rmxfc  and  Z  £  Rnxfe  be  any  matrices  that  satisfy  Yv  C  =  RZ  =  I  £  Rkxk. 
Then  CY1  is  a  projector  onto  Ran(C)  =  Ran(A)  and  (ZR)T  is  a  projector  onto  Ran(RT)  =  Ran(A7), 
where  Ran(-)  denotes  the  range  (column  space).  It  follows  that  CY7  A  =  A  and  (ZR)7  A7  =  A7  .  Putting 
U  =  YrAZ  gives 

CUR  =  CYtAZR  =  AZR  =  A. 

Thus,  any  choice  of  Y  and  Z  that  satisfies  Y7  C  =  RZ  =  I  gives  a  U  such  that  CUR  exactly  recovers  A.  In 
general  different  choices  for  Y  and  Z  give  different  U  =  Y7  AZ. 

1  We  use  the  nonstandard  notation  VSWT  for  the  SVD  to  avoid  conflicts  with  U  in  the  standard  CUR  notation. 
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Now  consider  the  general  case  (2.1).  Once  p,  q,  Y,  and  Z  have  been  specified,  then 

U  =  YtAZ  and  F  =  A  -  CUR. 

One  might  design  Y  and  Z  so  that  CUR  matches  the  selected  columns  C  =  A(:,  q)  and  rows  R  =  A(p, :)  of 
A  exactly.  This  can  be  accomplished  with  interpolatory  projectors ,  which  we  discuss  in  detail  in  the  next  section. 
For  now,  let  P  =  I(:,  p)  £  Rmxfc  and  Q  =  I(:,  q)  £  Rnxk  be  submatrices  of  the  identity,  so  that  P7 a  =  a(p) 
and  b7  Q  =  b(q)?  for  arbitrary  vectors  a  and  b  of  appropriate  dimensions.  Now  define  Y7  =  (P7  C)_1Pr  and 
Z  =  Q(RQ)”1  (presuming  PTC  and  RQ  are  invertible).  Then  since  C  =  A(:,  q)  and  R  =  A(p, :), 

pi  C  =  C(p, :)  =  A(p,  q)  and  RQ  =  R(:,  q)  =  A(p,  q), 

so 

U  =  YTAZ  =  (PTC)-1PrAQ(RQ)-1  =  A(p,  q)’1  A(p,  q)  A(p,  q)’1  =  A(p,  q)"1. 

This  CUR  approximation  matches  the  q  columns  and  p  rows  of  A, 

A(:,  q)  =  CUR(:,  q)  and  A(p, :)  =  C(p,  :)UR, 

and,  in  our  experiments,  usually  delivers  a  very  good  approximation.  However,  a  CUR  factorization  with  better 
theoretical  approximation  properties  results  from  orthogonal  projection,  as  originally  suggested  by  Stewart  [23, 
p.  320];  see  also,  e.g.,  Mahoney  and  Drineas  [22].  Given  a  selection  of  indices  p  and  q,  again  put 

C  =  A(:,q)  and  R  =  A(p, :). 

Assume  that  C  and  R  both  have  full  rank  k,  and  now  let  Y7  =  C7  =  (C7  C  )  'C7  and  Z  =  R7  =  R7  (RRr)_1 
denote  left  and  right  inverses  of  C  and  R.  These  choices  also  satisfy  YrC  =  I  and  RZ  =  I,  but  now  CY7  = 
CC7  and  ZR  =  R7R  arc  orthogonal  projectors.  We  compute 

u  =  ytaz  =  c7ar7, 

yielding  a  CUR  factorization  that  can  be  viewed  as  a  two  step  process:  first  the  columns  of  A  are  projected  onto 
Ran(C),  then  the  result  is  projected  onto  the  row  space  of  R: 

1)  M  =  CC7A,  2)  CUR  =  MR7R. 

Both  steps  arc  optimal  with  respect  to  the  2-norm  error,  which  is  the  primary  source  of  the  excellent  approximation 
properties  of  this  approach. 

Several  strategies  for  selecting  p  and  q  have  been  proposed.2  The  approach  presented  in  the  next  section  is 
simple  to  implement  and  has  complexity  rnk  and  nk  to  select  the  indices  p  and  q,  provided  the  leading  k  right 
and  left  singular  vectors  of  A  are  available.  Thus  the  overall  complexity  is  dominated  by  the  construction  of  the 
rank-A:  SVD  A  «  VSW7  ,  where  V7  V  =  WTW  =  I  £  Rkxk  and  S  =  diag(oi,  02, ,  <7fc)  is  the  k  x  k  matrix 
of  dominant  singular  values  o\  >  a 2  >  •  •  •  >  cr*.. 

2In  the  theoretical  computer  science  literature,  one  often  takes  C  and/or  R  to  have  rank  larger  than  k,  but  then  builds  U  with  rank  k. 
By  selecting  these  extra  columns  and/or  rows,  one  seeks  to  get  within  some  factor  1  +  e  of  the  optimal  approximation;  see,  e.g.,  [5], 
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3  DEIM 


The  DEIM  point  selection  algorithm  was  first  presented  in  [6]  in  the  context  of  model  order  reduction  for  nonlinear 
dynamical  systems,  and  is  a  discrete  variant  of  the  Empirical  Interpolation  Method  originally  proposed  in  [4],  The 
DEIM  procedure  operates  on  the  singular  vector  matrices  V  and  W  independently  to  select  the  row  indices  p  and 
column  indices  q.  We  explain  the  process  for  selecting  p;  applying  the  same  steps  to  W  yields  q.  To  derive  the 
method,  we  elaborate  upon  the  interpolatory  projectors  introduced  in  the  last  section. 

Definition  3.1  Given  a  full  rank  matrix  V  £  Rmxfc  and  a  set  of  distinct  indices  p  £  Nk,  the  interpolatory  projector 
for  p  onto  Ran(V)  is 

V  =  V(PTV)_1PT,  (3.1) 

where  P  =  I( : ,  p)  £  M.mxk,  provided  P7  V  is  invertible. 

In  general  V  is  an  oblique  projector,  and  it  has  an  important  property  not  generally  enjoyed  by  orthogonal 
projectors:  for  any  x  £  W'n, 

(Vx)(p)  =  PTVx  =  PTV(PTV)_1PTx  =  PTx  =  x(p), 

so  the  projected  vector  Vx  matches  x  in  the  p  entries,  justifying  the  name  “interpolatory  projector.” 

The  DEIM  algorithm  processes  the  columns  of 

V  -  [  v,  v2  •  •  •  vfe  ] 

one  at  a  time,  starting  from  the  leading  singular  vector  vi.  Each  step  processes  the  next  singular  vector  to  produce 
the  next  index.  The  first  index  p\  corresponds  to  the  largest  magnitude  entry  in  vi: 

|vi(pt)|  =  ||  ViHoo- 


Now  define  pi  =  [pi],  and  let 


Vi  =  vitPfvO^Pf 


denote  the  interpolatory  projector  for  pi  onto  Ran(vi).  The  second  index  p2  corresponds  to  the  largest  entry  in 
v2,  after  the  interpolatory  projection  in  the  vi  direction  has  been  removed: 


r2  =  v2  —  Piv2 
*2{p2)\  =  II r2  ||oo- 


Notice  that  r2(pi)  =  0,  since  V\  v2  matches  v2  in  the  p\  position,  a  consequence  of  interpolatory  projection.  This 
property  ensures  the  process  will  never  produce  duplicate  indices. 

Now  suppose  we  have  j  —  1  indices,  with 


Pi-i 


Pi 


Pi-i  J 


i-i 


.Pi-iJ 


V.7  — 1 


=  Vi 


vi-l 


Vi- 1  =  V, 


,v. 


\-1tdT 


i-i' 
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Input:  V,  an  m  X  k  matrix  (m  >  k) 

Output:  p,  an  integer  vector  with  k  distinct  entries  in  {1, ,  rn } 

v  =  V(:,  1) 

[~,pi]  =  max(|v|) 

P  =  [pi] 

for  j  =  2,3,...,  k 
v  =  V  (:,j) 

c  =  V(p,  1  :j  -  l)_1v(p) 
r  =  v  —  V(:,  1  :  j  —  l)c 
[~,Pj]  =  max(|r|) 

p  =  [p;  Pj\ 


Algorithm  1 :  DEIM  point  selection  algorithm. 


To  select  pj,  remove  from  v.;  its  interpolatory  projection  onto  indices  Pj_i  and  take  the  largest  remaining  entry: 

**./  =  V/  ~  Vj-lVj 

\rj(pj)\  =  Ibilloo- 

Implementations  should  not  explicitly  construct  these  projectors;  see  the  pseudocode  in  Algorithm  1  for  details. 

Those  familial-  with  partially  pivoted  LU  decomposition  will  notice,  on  a  moment’s  reflection,  that  this  index 
selection  scheme  is  exactly  equivalent  to  the  index  selection  of  partial  pivoting.  This  arrangement  is  equivalent  to 
the  “left  looking”  variant  of  LU  factorization  [9,  sect.  5.4],  but  with  two  important  differences.  First,  there  are  no 
explicit  row  interchanges  in  DEIM,  as  there  are  in  LU  factorization.  Second,  the  original  basis  vectors  (columns 
of  V)  are  not  replaced  with  the  residual  vectors,  as  happens  in  traditional  LU  decomposition.  (In  the  context  of 
model  reduction,  it  is  preferable  to  keep  the  nice  orthogonal  basis  intact  for  use  as  a  reduced  basis.)  We  will  exploit 
this  connection  with  partially  pivoted  LU  factorization  to  analyze  the  approximation  properties  of  DEIM. 

Since  the  DEIM  algorithm  processes  the  singular  vectors  sequentially,  from  most  to  least  significant,  it  intro¬ 
duces  new  singular  vector  information  in  a  coherent  manner  as  it  successively  selects  the  k  indices.  Contrast  this 
to  index  selection  strategies  based  on  leverage  scores,  where  all  singular  vectors  are  incorporated  at  once  via  row 
norms  of  V  and  W;  to  account  for  the  fact  that  higher  singular  vectors  are  less  significant,  such  approaches  often 
instead  compute  leverage  scores  using  only  a  few  of  the  leading  singular  vectors.3 

For  the  interpolatory  projector  Vj  to  exist  at  the  jth  step,  Pj_1  Vj_i  must  be  nonsingular.  The  linear  indepen¬ 
dence  of  the  columns  of  V  assures  this.  In  the  following,  ej  denotes  the  jth  column  of  the  identity  matrix. 

3  A  potential  limitation  of  the  DEIM  approach  is  that  r  ,  could  have  multiple  entries  that  have  nearly  the  same  magnitude,  but  only  one 
index  is  selected  at  the  jth  step;  if  the  other  large-magnitude  entries  in  r:l  are  not  significant  in  subsequent  re  vectors,  the  corresponding 
indices  will  not  be  selected.  One  can  imagine  modifications  of  the  selection  algorithm  to  account  for  such  situations,  e.g.,  by  processing 
multiple  singular  vectors  at  a  time. 
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Lemma  3.1  LetVj  =  [epi,  eP2, . . . ,  ePj]  and  let  V;y  =  [vi,  V2, . . . ,  Vj]  for  1  <  j  <  k.  /frank  (V)  =  k,  then 
PfV  j  is  nonsingular  for  1  <  j <  k. 


Proof:  Suppose  Pj_1Vj_i  is  nonsingular  and  let  r j  =  Vj  —  Vj_i(Pj_1Vj_i)  1  P-_  |  vy.  Then  j r  j  1 1 oc  >  0,  for 
otherwise  0  =  vj  —  V/-|C?_i,  in  violation  of  the  assumption  that  rank  (V)  =  k.  Thus 

0  <  =  | e^.vj  -  eJVj_1(Pj_1V_,-_i)“1Pj_1v:;-|)  (3.2) 

where  pj  is  the  jth  DEIM  interpolation  point.  Now  factor 


Pi’Vi 


where 


T 

Co. 

_ i 

1,-1 

0  ' 

'  PJ-1V7-1  Pj-iV 

L  4 

VI  \ 

.  epj Vj-!  (P J_ i Vj_ ! ) ~ 1 

1 

0  ”3  - 

vi  =  ~  1‘Pj  ,V,  i)  :P|  ;  v j . 


(3.3) 


The  inequality  (3.2)  implies  Vj  f  0  and  hence  equation  (3.3)  implies  Pj  V;  is  nonsingular.  Since  ej^vi  f  0,  this 
argument  provides  an  inductive  proof  that  P j  Vj  is  nonsingular  for  1  <  J  <  k.  U 


4  CUR  Approximation  Properties 

While  the  theory  presented  in  this  section  was  designed  to  bound  ||  A  —  CUR|j  for  the  DEIM-CUR  method,  the 
analysis  applies  to  any  CUR  factorization  with  full  rank  C  <E  Rmxk  and  R  €  Rkxn,  and  U  =  C]AR;,  regardless 
of  the  procedure  used  for  selecting  the  columns  and  rows.4 

Consider  a  CUR  factorization  that  uses  row  indices  p  £  Nfc  and  column  indices  q  £  Nk,  and  set 

P  =  I( : ,  p)  =  [ePl , . . . ,  epJ  £  Rmxk,  Q  =  I( : ,  q)  =  [egi , . . . ,  e, J  £  Rnxk. 

The  first  step  in  this  analysis  bounds  the  mismatch  between  A  and  its  interpolatory  projection  PA. 

Lemma  4.1  Assume  P7  V  is  invertible  and  let  V  =  V (Pv  Vf  1 P7  be  the  interpolatory  projector  (3.1).  If 
V7  V  =  I,  then  any  A  £  Rmxn  satisfies 

|| A  —  PA||  <  ||(PtV)-1||||(I-VVt)A||. 

Additionally,  ifV  consists  of  the  leading  k  left  singular  vectors  of  A,  then 

II A  -  PA||  =  ||(I  -  P)A||  <  ||(PTV)-1||  ak+1. 

4We  are  grateful  to  Ilse  Ipsen  for  noting  the  applicability  of  this  analysis  to  all  such  CUR  factorizations,  and  for  also  pointing  out  that, 
given  knowledge  of  all  the  singular  values  and  vectors  of  A.  our  Lemma  4.2  can  be  sharpened  via  application  of  [16,  Thm.  9.1].  Indeed, 
Ipsen  observes  that  the  interpolatory  projector  proof  of  Lemma  4.2  can  be  adapted  to  simplify  the  multipage  proof  of  [16,  Thm.  9.1]. 
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Proof:  First  note  that  PV  =  V (Pv  V)  1 P7  V  =  V,  so  that  (I  —  V)  V  =  0.  Therefore 

;A  PA  I  =  ||(I  —  P)A||  =  ||(I -P)(I-  VVt)A||  <  ||(I  -  P)||||(I  -  VVr)A||. 

It  is  well  known  that 

iii-^ii  =  ni  =  ii(pTv)-ln 

so  long  as  P  f  0  or  I;  see,  e.g.,  [24].  This  establishes  the  first  result.  The  second  follows  from  the  fact  that 

||(I  -  VVr)A||  =  || A  -  VSWT||  =  ak+1 

when  V  consists  of  the  leading  k  left  singular  vectors  of  A.  H 

Now  let  VSWr  ps  A  be  a  rank- A;  SVD  of  A.  (The  singular  vectors  play  a  crucial  role  in  this  analysis,  even  if 
p  and  q  were  selected  using  some  scheme  that  did  not  reference  them.)  In  addition  to  the  interpolatory  projector 
P  =  V (P7  V)_1P7  that  operates  on  the  left  of  A,  we  shall  also  use  Q  =  Q(  W7  Q)_1  WT,  which  operates  on 
the  right  of  A.  Assuming  that  P7  V  and  WTQ  are  invertible,  define  the  error  constants 

Vp  =  II  (P^V)- 1 1| ,  T]q  =  ||(WtQ)-1||. 

Lemma  4. 1  implies 

||A(I  -  Q) ||  <  riqak+i  and  || (I  -  P)A||  <  riPak+i.  (4.1) 

The  next  lemma  shows  that  these  bounds  on  the  error  of  the  interpolatory  projection  of  A  onto  the  select  columns 
and  rows  also  apply  to  the  orthogonal  projections  of  A  onto  the  same  column  and  row  spaces. 

Lemma  4.2  Suppose  the  row  and  column  indices  p  and  q  give  full  rank  matrices  C  =  A( : ,  q)  =  AQ  £  Rm  x  k 
and  R  =  A(p,  : )  =  PA  £  Rkxn,  with  finite  error  constants  pp  and  pq,  and  suppose  that  k  <  min{m,  n}.  Then 

||(I  -  CC7)A||  <  r)qok+i  and  ||A(I  -  R/R)||  <  rjpak+1. 

Proof:  Using  the  formula  C  =  AQ,  we  have  C7  =  (C7  C)~  1 C7  =  (Q7  A7  AQ)_1(AQ)T,  so  the  orthogonal 
projection  of  A  onto  Ran(C)  is 

CC7A  =  (AQ(QtAtAQ)-1QtAt)A  =  A(Q(QtAtAQ)-1QtAtA). 

Hence  the  error  in  the  orthogonal  projection  of  A  is 

(I  -  CCJ)A  =  A(I  -  $),  where  $  =  Q(QTArAQ)-1QTATA.  (4.2) 

Note  that  $  is  an  oblique  projector  onto  Ran(Q),  so  4>Q  =  Q.  Therefore,  4 ?Q  =  Q,  since 

=  $Q(WtQ)'1Wt  =  Q(WtQ)-1Wt  =  Q. 


This  implies  that 


A(I  -  $)  =  A(I  -  *)(I  -  Q)  =  (I  -  CC7)A(I  -  Q), 


and  so  from  (4.2)  we  have 


||(I-CC7)A||  =  ||A(I-*)|| 

=  ||(I  —  CC7)A(I  —  Q)|| 

<  III  -  CCJ||||  A(I  -  (2)|| 

—  Vq&k+l- 

The  last  line  follows  from  the  bound  (4.1)  and  the  fact  that  ||I  —  CC1  [|  =  1,  since  CC7  is  an  orthogonal  projector 
and  k  <  min{m,  n}. 

A  similar  argument  shows  that 

A(I  R7R)  =  (I  -  A 

where  \P  =  AATP(PJ  AA7P)_1PT,  and  also  that 

(I  -  \P)A  =  (I  -  V)(I  -  'PjA  =  (I  -  V)A{I  -  R7R), 
from  which  follows  the  error  bound 

II A(I  -  R7R)||  <11(1-  millll  -  RJR||  <  Vpak+ 1.  ■ 

The  main  result  on  approximation  of  A  by  CUR  readily  follows  from  combining  this  last  lemma  with  a  basic 
CUR  analysis  technique  used  by  Mahoney  and  Drineas  [22,  eq.  (6)]. 

Theorem  4.1  Given  A  £  Mmxn  and  1  <  k  <  rnin{m,  n},  let  C  =  A( : ,  q)  G  Rmxfe  and  R  =  A(p,  : )  £  Rkxn 
with  finite  error  constants  i}p  and  rjq,  and  set  U  =  C]ARJ.  Then 

II A  —  CUR||  <  (r]p  +  riq)ak+i. 

Proof:  From  the  definitions, 

A  -  CUR  =  A  -  CC7AR7R  =  (I  -  CC7)A  +  CC7A(I  -  R7R). 

Applying  Lemma  4.2, 

|| A  —  CUR||  <  ||(I-  CC7)A||  +  ||CC7||||A(I-R7R)|| 

<  'Hq&k+l  T  Vp&k+L 

=  (Vp  “I"  rtq)(Jk+ 1) 

since  ||CC7||  =  1.  H 

Theorem  4.1  shows  that  CUR  is  within  a  factor  of  rjp  +  i/q  of  the  optimal  rank -A-  approximation,  hence 
these  error  constants  suggest  a  way  to  assess  a  wide  variety  of  column/row  selection  schemes.  The  quality  of  the 
approximation  is  controlled  by  the  conditioning  of  the  selected  k  rows  of  the  dominant  k  (exact)  singular  vectors. 
If  those  singular  vectors  arc  available  as  paid  of  the  column/row  selection  process,  then  Theorem  4.1  provides  an 
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a  posteriori  bound  requiring  only  the  fast  (0(k:>"j)  computation  of  t)p  and  rjq,  and  thus  could  suggest  methods  for 
adjusting  either  k  or  the  point  selection  process  to  reduce  the  error  constants.  In  this  context,  notice  that  if  VS  W7 
is  only  an  approximation  to  the  optimal  rank-/;;  SVD  with  V  and  W  having  orthonormal  columns  (as  computed, 
for  example,  using  the  incremental  QR  algorithm  described  in  the  next  section),  the  preceding  analysis  gives 

|| A  —  CUR|j  <  ||(I-  CC7)A||  +  || A(I  —  RJR)|| 

=  ||A(I-Q)||  +  ||(I-P)A|| 

<  ||(WTQ)-1||||A(I-  WWt)||  +  ||(PTV)-1||||(I-  VVt)A||,  (4.3) 

showing  how  ak+ i  in  Theorem  4. 1  is  replaced  by  the  error  in  the  approximate  SVD  through  ||  A(I  —  WW7  )  ||  and 
||  (I  —  VV7  )  A 1 1 .  In  this  case  ||  ( W7  Q)_1 1|  and  ||  (P7  V)-1 1|  arc  computed  using  the  approximate  singular  vectors 
in  V  and  W,  rather  than  the  exact  singular  vectors  in  the  theorem.  Alternatively,  if  one  has  probabilistic  bounds 
for  r]p  and  r]q,  then  Theorem  4.1  immediately  gives  a  probabilistic  bound  for  ||  A  —  CUR||. 

Numerical  examples  in  Section  6  compare  how  the  error  constants  evolve  as  k  increases  for  the  DEIM-CUR 
factorization  and  several  other  factorizations  based  on  leverage  scores. 

4.1  Interpretation  of  the  bound  for  DEIM-CUR 

For  DEIM-CUR,  we  can  ensure  the  hypotheses  of  Theorem  4. 1  arc  satisfied  and  bound  the  error  constants.  Suppose 
the  DEIM  points  are  selected  using  the  exact  rank-/,:  SVD  A  ~  VSW7 .  Lemma  3.1  ensures  that  the  matrices 
P7  V  and  W7  Q  arc  invertible,  so  rjp  and  r/q  arc  finite.  The  DEIM  strategy  also  gives  full  rank  C  and  R  matrices, 
presuming  k  <  rank(A).  To  see  this,  note  that  for  any  unit  vector  y  <E  Rk, 

Cy  =  AQy  =  VSWrQy  +  EQy, 

where  E  =  A  -  VSWT.  Since  VTE  =  0, 

||Cy||2  =  1 1  AQy  1 1 2  =  ||VSWrQy||2  +  ||EQy||2. 

Since  || WTQy||  >  ||y||/||(WTQ)-1||  =  1/Vq, 

II Cy ||  >  || VSW^Qy ||  >  ak/Vq  >  0. 

Thus  C  must  be  full  rank.  A  similar  argument  shows  R  to  be  full  rank  as  well. 

The  examples  in  Section  6  illustrate  that  pp  and  rjq  are  often  quite  modest  for  the  DEIM-CUR  approach,  e.g., 
0(100).  However,  worst-case  bounds  permit  significant  growth  in  k  that  is  generally  not  observed  in  practice.  We 
begin  by  stating  a  bound  on  this  growth  developed  by  Chaturantabut  and  Sorensen  [6,  Lemma  3.2]. 

Lemma  4.3  For  the  DEIM  selection  scheme  derived  above , 

(1  +  s/2m)k~1  + 

dp  —  n  n  >  Vq  —  m  m  ? 

llvilloo  ||wi||oo 

where  vi  and  wj  denote  the  first  columns  of  V  and  W. 
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Motivated  by  recent  work  by  Drmac  and  Gugercin  [11]  on  a  modified  DEIM-like  algorithm  for  model  reduc¬ 
tion,  we  can  improve  this  bound  considerably. 

Lemma  4.4  For  the  DEIM  selection  scheme  derived  above, 


Proof:  We  shall  prove  the  result  for  i}p\  the  result  for  rjq  follows  similarly.  As  usual,  let  V  G  Rmxk  have  or¬ 
thonormal  columns,  and  let  p  =  DEIM(V)  denote  the  row  index  vector  derived  from  the  DEIM  selection  scheme 
described  above.  Let  P  =  I( : ,  p)  so  that  P1  V  =  V (p,  : ). 

Without  loss  of  generality,  assume  the  DEIM  index  selection  gives  p  =  [1,2,...,  k]T.  (Otherwise,  introduce  a 
permutation  matrix  to  the  argument  that  follows.)  As  described  in  section  3,  the  DEIM  index  selection  is  precisely 
the  index  selection  of  LU  decomposition  with  partial  pivoting,  so  one  can  write 

V  =  LT, 

where  the  nonsingular  matrix  T  G  Rkxk  is  upper  triangular  and  L  G  R"tx/,  is  unit  lower  triangular  with  L(7,  j) \  < 
1,  L (j,j)  =  1,  1  <  j  <  k  and  L (i,j)  =  0,  j  >  i. 

Let  Li  =  L(1  :  k,  1  :  k ).  Then  V(p,  : )  =  LiT  and  thus 

Vp  =  ||(PTV)-1||  =  |!(L1T)-1||  <  ||T-1||||Lr1||. 

(The  lineal-  independence  of  the  columns  of  V  ensure  that  Li  and  T  ai'e  invertible.)  Upper  bounds  for  |  T 1 1|  and 
IjL]-1 1|  will  give  an  upper  bound  for  r)p. 

To  bound  ||T-1||,  let  y  G  Rk  be  a  unit  vector  such  that  ||T_1y||  =  ||T_1|j.  Then 

IIT"1 1|  =  ||T_1y||  =  HVT-Vll  =  ||  Ly  || . 

Now 

II Ly  ||  <  s/m  ||Ly  Hoc  =  s/m\ejl,y\, 

for  some  index  j  G  {1, . . . ,  m } .  By  the  Cauchy-Schwai'z  inequality  and  the  bound  |L(i,  j)\  <  1, 

|ej Ly|  <  ||LTej|| ||y||  <  Vk  ■  1, 

and  so  it  follows  that  || T  1||  <  s/rrik. 

The  inverse  of  Li  can  be  bounded  using  forward  substitution.  Let  Liz  =  y,  where  ||y||  =  1  and  ||z||  =  ||L]_1||. 
Forward  substitution  provides 

Ci  =  7i 

i— 1 

Ci  =  7*  —  ~y  ^  ^ijCj  i  i  =  2, . . . ,  k, 

3= 1 
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where  Q  =  z (i),  7,  =  y(i)  and  Xl3  =  L We  now  use  induction  to  prove 

101  <  2*_1,  1  <  i  <  k. 

First  note  that  £i  =  71 ,  so  G  I  <  I71 1  <  1  =  2°  to  establish  the  base  case.  Assume  for  some  i  >  1  that 

IGI<2^\  1  <  3  <  1. 


Then 


I  Ci+1  I  - 


7»+i 

3= 1 


< 


l7i+l|  +  E  I 'MIC?  I 

3= 1 


i— 1 


to  complete  the  induction.  Now  since  ||z||  =  ||LX  ||, 


<  1  +  E  1  '  2j~1  =  1  +  E  2J  =  1  +  (2*  -  1)  =  2* 

i=i  3=0 

-ii 


fc-i 


IL^1!!2  =  zTz  =  Y,  1012  <  E4'  =  (4fc  - 


i=l  i=0 


Thus  ||L1 1||  <  2fc/\/3,  which,  together  with  the  bound  on  the  inverse  of  T,  provides  the  final  result  for  m  >  k : 


,pS||(P3'V)-‘||<^2‘. 

If  rn  =  k,  then  //?)  =  1,  and  the  result  holds  trivially.  H 

Note  that  this  proof  only  relies  on  the  orthonormality  of  the  columns  of  V  and  W,  and  hence  it  applies  when 
the  DEIM  selection  scheme  is  applied  to  approximate  singular  vectors,  as  in  CUR  error  bound  in  (4.3). 

Lemma  4.4  was  inspired  by  the  proof  technique  developed  by  Drmac  and  Gugercin  [1 1]  to  bound  j  (P7  V)-1 1| , 
when  P  is  selected  by  applying  a  pivoted  rank-revealing  QR  factorization  scheme  to  V.  Note  that  this  new  bound 
is  on  the  same  order  of  magnitude  as  the  Drmac-Gugercin  scheme.  In  practice,  their  scheme  seems  to  give  slightly 
smaller  growth  that  is  more  consistent  over  a  wide  range  of  examples.  Neither  scheme  experienced  exponential 
growth  over  very  extensive  testing.  For  the  DEIM  approach,  this  absence  of  exponential  growth  is  closely  related 
to  decades  of  experience  with  Gaussian  elimination  with  partial  pivoting.  Element  growth  in  T  is  bounded  by  a 
factor  of  2/,:  1  (for  a  k  x  /;:  matrix),  and  there  is  an  example  that  achieves  this  growth.  Nevertheless,  this  algorithm 
is  almost  exclusively  used  to  solve  linear  systems  because  such  growth  is  never  experienced.5  Indeed,  a  similar 
near  worst  case  example  can  be  constructed  for  DEIM,  although  this  growth  has  not  been  observed  in  practice. 

5  See,  for  example,  the  extensive  numerical  tests  involving  random  matrices  described  in  [26,  lecture  22]  and  [27],  Interestingly,  in  the 
experiments  of  Trefethen  and  Schreiber  [27],  random  matrices  with  orthonormal  columns  tend  to  have  slightly  larger  growth  factors  than 
Gaussian  matrices,  though  both  cases  are  very  far  indeed  from  the  exponential  upper  bound. 
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A  Growth  Example:  We  now  construct  an  orthonormal  matrix  V  with  the  property 


^=2k<  Vp  =  ||(PTV)-1||  <  ^  2k  (4.4) 

where  P7  V  =  V (p,  : )  with  p  =  DEIM(V).  To  construct  V,  begin  by  defining 


Now  construct  VTi  =  L  as  an  economy-sized  QR  factorization  of  L  (with  no  column  pivoting).  Since  the  columns 
of  L  arc  linearly  independent  by  construction,  Ti  G  Rkxk  is  invertible;  define  T  =  Tx  1,  so  that  V  =  LT.  (Note 
that  T  plays  the  same  role  it  does  in  the  proof  of  Lemma  4.4.)  If  the  DEIM  procedure  is  applied  to  V,  then  by 
construction  p  =  [1,  2, . . . ,  k]  (in  exact  arithmetic):  during  the  DEIM  procedure,  the  relations 

li  Tu  =  v.i  -  v,  i  :,>j  ; v  /  t)  ;,>j  i  v.i  •  3  >  1 

hold,  with  lj  =  L( :  ,j),  tjj  =  T  (j,j),  Vj  =  V(:,j),  Py-i  =  I( : ,  1  :  j  ~  1)  and  Vj-!  =  V( : ,  1  :  j  -  1).  Thus 
p(j)  =  j,  j  >  1  and  it  is  easily  seen  that  p(l)  =  1. 

Note  that  VT_1  =  L  implies  T~7  T^1  =  L7  L  hence  |jT||  =  l/a^,  where  a ^  is  the  smallest  singular  value 
of  L.  Let  y  be  the  corresponding  right  singular  vector,  so  that 

4  =  yTL7’Ly. 

We  claim  that  ny.  >  y/2.  To  see  this,  write  L  in  the  form 


where 


forf  =  [1,...,1]T  G  Rm~k. 

LrL  =  Ifc  —  L0  —  +  Lq  L0  +  EtE 

=  Ifc  -  (eeT  -  Ife)  +  LqL0  +  {m  -  k)eeT 
=  2 1^.  +  Lq  Lo  +  (m  —  k  —  l)eeT 

=  2 Ifc  +  M, 
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where  M  :=  Lq  Lo  +  (m  —  k  —  l)eeT  is  symmetric  and  positive  semidefinite  whenever  m  >  k.  Thus 

o\  =  y1  L1  Ly  =  yT(2Ifc  +  M)y  >  2 

and  hence  it  follows  that 

1 1 T 1 1  <  l/y/2. 

This  implies 

||(pTv)"1||  =  HT^Lr1!!  >  >  V2  ULr1!!. 

To  complete  the  lower  bound,  we  must  analyze  IjL^1 1|.  Forward  substitution  gives  L^ei  =  [1, 1,  2, 4, ... ,  2k~2]T 
and  thus 

HL^H  >  ||L71ei||  =  \] ^  +  (4fc_1  —  l)/3  >  2k~2 . 

We  arrive  at  the  lower  bound 

rip  =  IKP^V)-1!!  >  V2  HL^H  >  V2  ■  2k~2, 

and  thus  for  this  choice  of  V,  the  DEIM  error  constant  satisfies 

1  /mfc 

7i2  <r?p< v^“  ■ 

This  example  is  interesting  because  it  relies  on  the  behavior  of  the  classic  example  for  growth  in  LU  decom¬ 
position  [26,  lecture  22].  However,  in  this  case  the  pathological  growth  is  caused  by  L  and  not  by  T. 

5  Incremental  QR  Factorization 

The  DEIM  point  selection  process  presumes  access  to  the  first  k  left  and  right  singular  vectors  of  A  G  Rmxn. 
If  either  m  or  n  is  of  modest  size  (say  <  1000)  and  A  can  be  stored  as  a  dense  matrix,  library  software  for 
computing  the  “economy  sized”  SVD,  e.g.,  [V,  S,W]  =  svd  (A,  'econ'  )  in  MATLAB,  usually  performs 
very  well.  For  larger  scale  problems,  the  leading  k  singular  vectors  can  be  computed  using  iterative  methods,  such 
as  the  Krylov  subspace-based  ARPACK  software  [21]  (used  by  MATLAB’s  svds  command),  PROPACK  [20], 
IRLBA  [1],  or  the  Jacobi-Davidson  algorithm  [17].  Randomized  SVD  algorithms  provide  an  appealing  alternative 
with  probabilistic  error  bounds  [16].  Here  we  describe  another  approach  that  satisfies  a  deterministic  error  bound 
(Lemma  5.1)  and  only  requires  one  pass  through  the  matrix  A,  a  key  property  for  massive  data  sets  that  cannot 
easily  be  stored  in  memory. 

This  approach  is  based  on  an  incremental  low  rank  A  ~  QR  approximation,  where  Q  e  Rnxk  has  orthonor¬ 
mal  columns  and  R  e  Rfcxm  is  upper  triangular.  (In  this  section  only,  Q  and  R  denote  different  quantities  from 
elsewhere  in  the  paper.)  Take  the  dense  (economy  sized)  SVD  R  =  VSWT,  and  put  V  =  QV  to  get 

A  «  QR  =  VSWT.  (5.1) 
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Input:  A,  an  m  x  n  matrix 

tol,  a  positive  scalar  controlling  the  accuracy  of  the  factorization 

Output:  Q,  an  m  x  k  matrix  with  orthonormal  columns 
R,  a  k  x  n  rectangular  matrix 
with  A  ~  QR 


Choose  k  <C  min (m,  n) 

Compute  the  QR  factorization  A(:,  1  :  k)  =  QR,  with  Q  €  Rnxk  and  R  €  Rkxm 
rownorms(i)  =  ||R(«,  :)||2  for  i  =  1, . . . ,  k 
j  =  k  +  1 

while  j  <  n 

a  =  A(:,j);  r  =  QTa;  f  =  a  -  Qr;  p=||f||;  q  =  i/p 

Q  =  [Q,  q];  R=  [  ^  * 


rownorms©  =  rownorms©  +  r(i)2  for  i  =  1, . . . ,  k 
rownormst  k  +  1)  =  p2 ; 

FnormR  =  sum(rownorms); 

[<7,  ^min]  =  min(rownorms(l  :  k  +  1)); 

if  a  >  (tol2)  *  (FnormR  —  rownorms(rmin) 

%  no  deflation 
k  =  k  +  1; 

else  %  deflation  required 

if  Zmin  k  I 

R(*mim  0  =  R (k  +  1,  :);  Q(: ,  imin)  =  Q(:)  k  +  1) 
rownorms  ('imm )  =  rownorms  ( k  +1) 

end 

%  delete  the  minimum  norm  row  o/R 
Q  =  Q(:,  1  :  k);  R  =  R(1  :  k,  :) 

end 

3=3  +  1 


Algorithm  2:  Incremental  QR  low  rank  approximate  factorization 


Incremental  algorithms  for  building  the  QR  factorization  and  SVD  have  been  proposed  by  Stewart  [23],  Baker, 
Gallivan,  and  Van  Dooren  [2]  and  many  others,  as  surveyed  in  [3];  these  ideas  are  also  closely  related  to  rank- 
revealing  QR  factorizations  [15],  Algorithm  2  differs  from  those  of  Stewart  in  its  use  of  internal  pivoting  and 
threshold  truncation  in  place  of  Stewart’s  column  pivoting.  This  distinction  enables  a  one-pass  algorithm  that  is 
closely  related  to  [2,  Algorithm  1]. 
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The  proposed  method  is  presented  in  Algorithm  2,  which  proceeds  at  each  step  by  orthogonalizing  a  column 
of  A  against  the  previously  orthogonalized  columns.  The  rank  of  the  resulting  factors  is  controlled  through  an 
update-and-delete  procedure  that  is  illustrated  in  Figure  2.  After  orthogonalizing  a  column  of  A,  the  algorithm 
checks  if  any  row  of  R  has  small  relative  norm;  if  such  a  row  exists,  the  corresponding  column  of  Q  makes  little 
contribution  to  the  factorization,  so  that  column  of  Q  and  row  of  R  can  be  deleted  at  only  a  small  loss  of  accuracy 
in  the  factorization.  (Future  columns  of  A  will  not  be  orthogonalized  against  the  vector  deleted  from  Q,  so  this 
direction  can  re-emerge  if  a  later  column  in  A  warrants  it.) 

Robust  implementations  of  Algorithm  2  should  replace  the  classical  Gram-Schmidt  operations 

r  =  Q7  a,  f  =  a  -  Qr 

with  a  re-orthogonalization  step,  as  suggested  by  Daniel,  Gragg,  Kaufman,  and  Stewart  [8] 

r  =  Qra 
f  =  a  -  Qr 
c  =  QTf 
f  =  f  -  Qc 
r  =  r  +  c 

P  =  llfll 

q  =  f/p. 

The  extra  steps  (5.2)-(5.4)  generally  provide  a  Q  that  is  numerically  orthogonal  to  working  precision.  Pathological 
cases  arc  easily  overcome  with  some  additional  slight  modifications;  see  [13]  for  a  complete  analysis.  Because 
this  algorithm  uses  the  classical  Gram-Schmidt  method,  one  can  easily  block  it  for  parallel  efficiency. 

5.1  Incremental  QR  Error  Bounds 

At  step  j  the  truncation  criterion  in  Algorithm  2  will  delete  row  r J  =  ej R?  if 

||rj||  <  tol  ||Rj||ir, 

where  r[  is  the  row  of  minimum  norm  and  R.y  denotes  Rj  with  the  ith  row  deleted.  This  strategy  has  a  straight¬ 
forward  error  analysis,  which,  in  light  of  the  approximation  (5.1),  also  implies  an  error  bound  on  the  resulting 
SVD. 

Lemma  5.1  Let  R.y  be  the  triangular  factor  at  step  j  of  Algorithm  2,  and  Q  ;  the  corresponding  orthonormal 
columns  in  the  approximate  QR  factorization  A  y  ~  Q  y  Ry,  where  A  y  consists  of  the  first  j  columns  of  A..  Then 

||  Aj  —  QjRj||p  <  tol  ■  dj  •  ||Rj||i?, 

where  dj  is  the  number  of  coloumn/row  deletions  that  have  been  made  up  to  and  including  step  j.  (Note  that 
Qj  €  R  £  uQ~di)xf  and  dn  =  min{m,  n}  —  k,  where  k  =  rank  (QnRn).) 


(5.2) 

(5.3) 

(5.4) 
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A( : ,  1  :  j)  = 


A(:,l:j  +  1)  = 


Partial  QR  factorization 


Extend  with  Gram-Schmidt 


4  V 


Find  qj  with  Replace  q*,  R(i, :) 

|R(v)||2<t0/2(||R|||-||R(v)||2) 


Truncate  last  column 
of  Q  and  row  of  R 


Figure  2:  Diagram  illustrating  the  QR  update  procedure. 


Before  proving  this  lemma,  we  note  that  it  gives  a  bound  on  the  error  in  the  resulting  approximate  SVD  of 
A.  Suppose  dn  deletions  are  made  when  this  algorithm  computes  the  approximate  factorization  A  ~  QR  with 
tolerance  tol.  Given  the  SVD  R  =  VSW*,  set  V  =  QV.  Then 

||  A  —  VSW*||f  <  tol  ■  dn  ■  ||R||f- 

Proof  of  Lemma  5.1:  The  proof  shall  be  by  induction.  Let  E;  =  A  j  —  Q;  R?  and  assume 

||Ej||j?  <  tol  ■  dj  ■  ||Rj||_F.  (5.5) 

Orthogonalize  column  j  +  1  of  A  using  Gram-Schmidt  to  obtain 


Aj+i  —  1  R'j + 1  +  F/  ■  0  ■ 


If  no  deflation  occurs  at  this  step,  the  bound  holds  trivially  since 

||Ej+i||i?  =  || [Ej ,  0] \\f  <  tol  ■  dj  ■  ||Rj ||p  <  tol  •  dj+i  •  ||Rj-)_i 1 1 jr , 
because  dj+i  =  dj  and  ||Rj||_f  <  || Rj_|_i || jr. 
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Suppose  Rj  has  dimension  k  x  j  (i.e.,  k  =  j  —  dj).  Let  i  be  the  index  of  the  row  of  minimum  norm  and  let 
Ry+i  be  obtained  by  deleting  the  ith  row  of  Rj+i  •  If  r  J  =  ef  Rj+i  satisfies  |r'7|  <  tol •  ||Rj+i \\r  then  deflation 
occurs.  Deleting  column  i  of  Qj+i  and  row  i  of  Rj+i  replaces  Qj+i  and  Rj+i  with  Qj+i  and  Rj+i-  Then 

Qj+lRj+1  =  Qj+l(&j+l  —  eiri  )) 

and 

Aj+i  =  Qj+i(Rj+i  —  eiri  )  +  [Ej,  0]  +  Qj+ieir?:  . 

Hence  the  deletion  gives  the  overall  error 

Ej+i  =  Aj+i  —  Qj+iRj+i  =  [Ej,  0]  +  Qj+ieiri  . 

Therefore,  when  i  <  k  +  1,  the  inductive  assumption  (5.5)  implies 

]|Ej+i||f  <  \\-Ej\\F  +  ||rf||  tol  *  (dj  *  1 1  Rj  1 1  p  1 1  Rj+i  1 1  f  ^  tol  ( dj  I  1 )  *  ||  Rj+i  ||^5 

since  Rj+i  contains  row  k  +  1  of  Rj+i,  which  must  have  a  norm  larger  than  the  row  marked  for  deletion.  Since 
row  k  +  1  of  Rj  i  consists  of  just  one  nonzero  element, 

IIrj+iIIf  ^  IIRjIIf  +  pl+i,j+i  ^  IIRjIIf, 

where  pk+i  j+t  is  the  element  Rj+i  (k  +  1,  j  +  1)  and  Rj  is  the  matrix  Rj  with  /th  row  deleted.  If  i  =  k  +  1,  then 
the  last  row  of  Rj+i  is  deleted  and  the  desired  inequality  must  hold,  since  Rj  is  a  submatrix  of  Rj+i-  At  the  end 
of  this  process,  replace  Rj+i  and  Qj+i  with  Rj+i  and  Qj+i  to  obtain  the  approximation 

II Aj_|_i ||  £  tol  ■  dj+\  •  ||Rj+i||F) 

since  dj+\  =  dj  +  1. 

The  error  bound  for  the  base  case  j  =  1  clearly  holds,  completing  the  induction.  H 

The  approximate  QR  factorization  that  results  from  this  algorithm  could  be  used  directly  for  the  approximation 
of  leverage  scores.  The  perturbation  theory  of  Ipsen  and  Wentworth  [18]  describes  how  the  tolerance  in  our 
algorithm  will  affect  the  accuracy  of  the  resulting  leverage  scores.  We  also  note  that  for  extra  expediency  this  one- 
pass  QR  algorithm  could  be  stopped  when  | R;  1 1  ^  ~  |  A||  p  (at  the  cost  of  an  extra  pass  through  A  to  compute 
I  A|| p),  or  applied  to  only  a  random  sampling  of  k  columns  of  A.  (Drmac  and  Gugercin  propose  a  different 
random  approach  to  DEIM  index  selection,  based  on  sampling  rows  of  V  to  compute  DEIM  indices  [11].) 

6  Computational  Examples 

This  section  presents  some  computational  evidence  illustrating  the  excellent  approximation  properties  of  the 
DEIM-CUR  factorization,  consistent  with  the  error  analysis  in  Section  4.  For  each  of  our  three  examples,  we 
compare  the  accuracy  of  the  DEIM-CUR  factorization  with  several  schemes  based  on  leverage  scores.  To  remove 
random  variations  from  our  experiments,  in  most  cases  we  select  columns  and  rows  having  the  highest  leverage 
scores;  for  the  first  example,  we  include  results  for  random  leverage  score  sampling.  For  Example  1  we  also  study 
the  effect  of  inaccurate  singular  vectors  on  the  DEIM  selection,  and  compare  the  accuracy  of  DEIM-CUR  to  CUR 
approximations  based  on  the  column-pivoted  QR  algorithm. 
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Example  1.  Low-rank  approximation  of  a  sparse,  nonnegative  matrix 
The  first  example  builds  a  matrix  A  e  g-soo.oooxsoo  t|lc  form 

10  „  300  1 

A  =  £  7  *jyJ  +  £  7  >  (6- 

where  x;/  G  gGOO.oon  an(j  y^.  g  ]g>300  are  Sparse  vectors  with  random  nonnegative  entries  (in  MATLAB,  Xj  = 
sprand(300000, 1,  0.025)  and  y j  =  sprand(300, 1,  0.025)).  In  this  instantiation,  A  has  15,971,584  nonzeros, 
i.e.,  about  18%  of  all  entries  are  nonzero.  The  form  (6.1)  is  not  a  singular  value  decomposition,  since  { x? }  and 
{y^}  arc  not  orthonormal  sets;  however,  this  decomposition  suggests  the  structure  of  the  SVD:  the  singular  values 
decay  like  1/j,  and  with  the  first  ten  singular  values  weighted  more  heavily  to  give  a  notable  drop  between  ctio 
and  a u.  We  begin  these  experiments  by  computing  V  and  W  using  MATLAB’s  economy-sized  SVD  routine 
([V,  S,W]  =  svd  (A,  '  O'  )). 

Figure  3  compares  the  error  ||  A  —  CUR||  for  DEIM-CUR  and  methods  that  take  C  and  R  as  the  columns 
and  rows  of  A  with  the  highest  leverage  scores.  These  scores  are  computed  using  either  all  right  and  left  singular 
vectors  (300  of  each),  or  using  only  the  leading  ten  right  and  left  singular  vectors.  Both  approaches  perform  rather 
worse  than  DEIM-CUR,  which  closely  tracks  the  optimal  value  (Ty.+\ . 

To  gain  insight  into  these  results,  we  examine  the  interpolation  constants  rjp  and  rjq  for  all  three  approaches. 
Figure  4  shows  that  these  constants  are  largest  for  leverage  scores  based  on  all  the  singular  vectors;  using  only 
ten  singular  vectors  improves  both  the  interpolation  constants  and  the  accuracy  of  the  approximation  (as  seen  in 
Figure  3).  The  DEIM-CUR  method  gives  better  interpolation  constants  and  more  accurate  approximations. 


LS  (all)  — • — LS  (10)  DEIM - DEIM(V.W)  — —vk+i 


Figure  3:  Accuracy  of  CUR  approximations  for  the  sparse,  nonnegative  matrix  (6.1)  using  k  columns  and  rows, 
constructed  by  DEIM-CUR  and  two  leverage  score  strategies:  “LS  (all)”  selects  rows  and  columns  with  highest 
leverage  scores  computed  using  all  300  singular  vectors;  “LS  (10)”  only  uses  the  leading  ten  singular  vectors.  The 
“DEIM(V,  W)”  curve  (nearly  atop  the  “DEIM”  curve)  uses  approximate  singular  vectors,  described  later. 
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—4— r]p,  LS  (all)  — ►—  rjq,  LS  (all)  r]p,  LS  (10)  — LS  (10)  |  |— A— DEIM  — r— fy/-  DEIM 


Figure  4:  Error  constants  rjp  =  ||(P^:  Vfc)  1 1  and  i)n  =  ||(W^  Q)  1 1  for  rows  and  columns  selected  using  two 
leverage  score  strategies  (left  plot)  and  the  DEIM  algorithm  (right  plot),  for  the  matrix  (6.1). 


a  LS  (10)  — • — LS  (sampled)—* — DEIM  — • — &k+i 


Figure  5:  Accuracy  of  CUR  approximations  for  (6.1)  generated  by  randomly  sampling  rows  and  columns  with 
probability  weighted  by  leverage  scores  computed  from  the  leading  ten  singular  vectors.  All  ten  trials  (gray  lines) 
perform  similarly  to  the  deterministic  “LS  (10)”  approach,  and  worse  than  the  DEIM-CUR  approximation. 


A  CUR  factorization  can  also  be  obtained  by  randomly  sampling  columns  and  rows  of  A,  with  the  probability 
of  selection  weighted  by  leverage  scores  [22].  We  apply  this  approach  on  the  current  example,  selecting  k  =  30 
rows  and  columns  of  A  with  a  probability  given  by  the  leverage  scores  computed  from  the  leading  ten  singular 
vectors  (normalized  to  give  a  probability  distribution).  Figure  5  gives  the  results  of  ten  independent  experiments, 
showing  that  while  sampling  can  sometimes  yield  better  results  than  the  deterministic  leverage  score  approach, 
overall  the  approximations  are  still  inferior  to  those  from  DEIM-CUR. 

How  robust  is  the  DEIM-CUR  approximation  to  errors  in  the  singular  vectors?  To  investigate,  we  compute 
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Figure  6:  The  angle  between  the  leading  A:-dinicnsional  exact  singular  subspaces  Ran(V/c)  and  Ran(W/j  (gen¬ 
erated  by  MATLAB’s  svd  command)  and  approximate  singular  subspaces  Ran(Vfc)  and  Ran(Wfc)  for  the  ma¬ 
trix  (6.1).  On  the  left,  Vj.  and  W/.  are  generated  using  the  Incremental  QR  algorithm  described  in  Section  5,  with 
tol  =  Hr  on  the  right,  V/,  and  W/,.  are  generated  using  randomized  SVD  algorithm  [16]  using  one  and  two 
applications  of  A  and  A7  . 

V  «  V  and  W  ~  W  using  the  Incremental  QR  algorithm  detailed  in  Section  5  (with  tol  =  10-4)  and  the 
Randomized  SVD  algorithm  described  by  Halko,  Martinsson,  and  Tropp  [16,  p.  227].  To  give  extreme  examples 
of  the  latter,  we  compute  V  and  W  through  only  one  or  two  applications  each  of  A  and  A7  .6  As  Figure  6 
illustrates,  in  both  cases  the  angle  between  the  exact  and  approximate  leading  singular  subspaces  is  significant, 
particularly  as  k  grows.  This  drift  in  the  subspaces  has  little  effect  on  the  accuracy  of  the  DEIM  approximations. 

•  The  DEIM  approximation  using  the  Incremental  QR  algorithm  is  quite  robust,  choosing  at  most  3  different 
row  indices  and  2  different  column  indices  for  k  =  1, . . . ,  30,  with  a  relative  discrepancy  in  ||  A  —  C^UfcRfc  || 
of  at  most  9.27%  (and  this  realized  only  at  step  k  =  30). 

•  When  A  and  AT  are  applied  once  in  the  Randomized  SVD  algorithm,  the  DEIM  indices  differ  considerably 
from  those  drawn  from  exact  singular  vectors  (e.g.,  for  k  =  30,  20  of  30  row  indices  and  3  of  30  column 
indices  differ),  yet  the  quality  of  the  approximation  ||A  —  C/.U/..R/,.  |  remains  almost  the  same  (relative 
difference  of  at  most  10.45%);  see  the  dashed  line  in  Figure  3. 

•  When  A  and  A7  are  applied  twice,  the  DEIM  indices  are  nearly  identical  (e.g.,  for  k  =  30,  0  of  30  row 
indices  and  2  of  30  column  indices  differ).  On  the  scale  of  the  plot  in  Figure  3,  ||  A  —  C/i:U/,.R,/,  |  could  not  be 
distinguished  from  the  DEIM-CUR  errors  using  exact  singular  vectors;  the  maximum  relative  discrepancy 
is  2.21%. 


6This  corresponds  to  q  =  0  and  q  =  1  in  the  notation  of  [16,  p.  227].  Let  the  columns  of  Q  £  R“x2fcm“  form  an  orthonormal 
basis  for  (AAT)9Afi,  where  fl  £  R"x2  max  is  a  random  matrix  with  i.i.d.  Gaussian  entries  and  we  take  fcmax  =  30.  Then  the  leading 
fcmax  columns  of  V  and  W  are  approximated  by  taking  the  SVD  of  Q*  A  £  Ra,n“  xn. 
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Thus  far  we  have  only  compared  the  DEIM-CUR  approximations  to  CUR  factorizations  obtained  from  leverage 
scores,  which  also  use  singular  vector  information,  thus  illustrating  how  DEIM  can  use  the  same  raw  materials  to 
better  effect.  Next  we  compare  DEIM-CUR  to  approximations  computed  using  a  different  approach  based  on 
QR  factorization  of  A;  see,  e.g.,  [7,  23].  Begin  by  computing  a  column-pivoted  QR  factorization  of  A;  the 
first  k  selected  columns  give  the  indices  q,  from  which  we  extract  C/c  =  A( : ,  q).  Next,  a  column-pivoted  QR 


ratio  (DEIM-CUR  error)/(QR-CUR  error) 


logi0(%>)  for  DEIM-CUR 


l°gio (VP)  for  QR-CUR 


Figure  7:  Comparison  of  DEIM-CUR  and  QR-CUR  performance  for  100  sparse  random  300,  000  x  300  matrices 
of  the  form  (6.1).  The  top  plot  shows  a  histogram  of  the  ratio  of  ||  A  —  C/,U/.R/,.  |  for  DEIM-CUR  and  QR-CUR. 
The  bottom  plots  compare  the  error  constant  rjp  =  ||  (PjT  V/,. ) ~ 1 1  for  DEIM-CUR  (left)  and  QR-CUR  (right);  note 
the  logarithmic  scale  of  the  horizontal  axes  in  the  lower  plots. 
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factorization  of  Cjr  is  performed;  the  first  k  selected  columns  of  C{:  give  the  indices  p,  from  which  we  build 
R/  =  A(p,  : ).  We  refer  to  this  technique  as  “QR-CUR.” 

Figure  7  compares  the  results  for  100  trials  involving  sparse  random  matrices  of  dimension  300,  000  x  300 
having  the  form  of  our  first  experiment  (6.1).  DEIM-CUR  and  QR-CUR  produce  factorizations  with  similar 
accuracy,  which  we  illustrate  with  a  histogram  of  the  ratio  of  ||A  —  C/,U/..R,/,  |  for  DEIM-CUR  to  QR-CUR,  for 
k  =  1, . . . ,  100.  (DEIM-CUR  produces  a  smaller  error  when  the  ratio  is  less  than  one.)  While  these  errors  are 
similar,  the  error  constants  rjp  and  rjq  for  the  two  methods  are  quite  different.  The  bottom  plots  in  Figure  7  compare 
histograms  of  log10  rjp.  For  DEIM-CUR,  the  rjp  values  are  quite  consistent  across  the  100  random  A,  while  for 
QR-CUR  the  rjp  values  are  both  larger  and  rather  less  consistent.  (The  figures  for  r]q  are  qualitatively  identical,  but 
about  an  order  of  magnitude  smaller  for  both  methods.) 

The  advantage  of  DEIM-CUR  over  approximations  based  on  leverage  scores  remains  when  the  singular  values 
decrease  more  sharply.  Modify  (6.1)  to  give  a  more  significant  drop  between  trio  and  a\  \ : 


A  = 


^  1000  T  ^  1 

—  ^  , 

i=t  J  j= n  J 


3  '  n 


(6-2) 


As  seen  in  Figures  8  and  9,  the  DEIM-CUR  approach  again  delivers  excellent  approximations,  while  selecting  the 
rows  and  columns  with  highest  leverage  scores  does  not  perform  nearly  as  well.  (In  Figure  9,  note  the  significant 
jump  in  the  “LS  (10)”  error  constant  r]q  corresponding  to  those  k  values  where  ||  A  —  CfcU/jRfcH/crfc+i  is  large.) 


-LS  (all)  — a— LS  (10) 


-DEIM 


-Pk  + 1 


Figure  8:  Accuracy  of  CUR  approximations  using  k  rows  and  columns,  for  DEIM-CUR  and  two  leverage  score 
strategies  for  the  sparse,  nonnegative  matrix  (6.2).  “LS  (all)”  selects  rows  and  columns  having  the  highest  leverage 
scores  computed  using  all  300  singular  vectors;  “LS  (10)”  uses  the  leading  10  singular  vectors. 
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Figure  9:  Error  constants  rjp  =  ||  (P'j.  V/,.)  1 1|  and  rjq  =  ||  (W^Q/,.)  1 1|  for  rows  and  columns  selected  using  two 
leverage  score  strategies  (left  plot)  and  the  DEIM  algorithm  (right  plot),  for  the  sparse  matrix  A  given  in  (6.2). 


Example  2.  TechTC  term  document  data 

The  second  example,  adapted  from  Mahoney  and  Drineas  [22],  computes  the  CUR  factorization  of  a  term  docu¬ 
ment  matrix  with  data  drawn  from  the  Technion  Repository  of  Text  Categorization  Datasets  (TechTC)  [12].  The 
rows  of  the  data  matrix  correspond  to  websites  (consolidated  from  multiple  webpages),  while  the  columns  corre¬ 
spond  to  “features”  (words  from  the  text  of  the  webpages).  The  (j,  k)  entry  of  A  reflects  the  importance  of  the 
feature  text  on  the  given  website;  most  entries  are  zero.  For  this  experiment  we  use  TechTC- 100  test  set  26,  which 
concatenates  a  data  set  relating  to  Evansville,  Indiana  (id  10567)  with  another  for  Miami,  Florida  (id  11346). 
Following  Mahoney  and  Drineas  [22],  we  omit  all  features  with  four  or  fewer  characters  from  the  data  set,  leav¬ 
ing  a  matrix  with  139  rows  and  15,170  columns.  Each  row  of  A  is  then  scaled  to  have  unit  2-norm.  Ideally  a 
CUR  factorization  not  only  gives  an  accurate  low-rank  approximation  to  A,  but  also  selects  rows  corresponding  to 
representative  webpages  from  each  geographic  area,  and  columns  corresponding  to  meaningful  features. 

Figure  10  compares  DEIM-CUR  approximations  to  row  and  column  selection  based  on  highest  leverage  scores 
(from  all  singular  vectors,  or  the  two  leading  singular  vectors).  The  DEIM-CUR  approximations  are  typically 
more  accurate  than  those  based  on  leverage  scores,  but  all  approaches  give  errors  roughly  two  times  larger  than  the 
slowly-decaying  optimal  value  of  (Jk+i-  How  do  the  DEIM  columns  (features)  compare  to  those  with  the  highest 
leverage  scores?  Figure  1 1  shows  the  leverage  scores  associated  with  each  column  of  A  (based  on  the  two  leading 
singular  vectors),  along  with  the  first  30  columns  selected  by  DEIM.  While  the  columns  with  highest  leverage 
scores  were  found  by  DEIM,  there  are  DEIM  columns  with  marginal  leverage  scores,  and  vice  versa.  This  data 
is  more  easily  parsed  in  Table  1,  which  lists  the  features  corresponding  to  the  first  20  DEIM  columns.  (To  ease 
comparison,  we  normalize  leverage  scores  so  that  the  maximum  value  is  one.)  The  leading  features  identified  by 
DEIM,  including  “evansville”  (first  DEIM  point),  “florida”  (second),  “miami”  (sixth),  and  “indiana”  (nineteenth), 
indeed  reveal  key  geographic  terms.  These  terms  scored  at  least  as  high  when  ranked  by  leverage  scores  based  on 
two  leading  singular  vectors;  when  all  singular  vectors  are  used,  the  scores  of  these  terms  generally  drop,  relative 
to  other  features.  Overall,  one  notes  that  DEIM  selects  a  significantly  different  set  of  indices  than  those  valued  by 
leverage  scores,  and,  as  seen  in  Figure  10,  tends  to  provide  a  somewhat  better  low-rank  approximation. 


24 


-4— LS  (all)  — a— LS  (2)  — DEIM  — —  °k+i 


Figure  10:  Accuracy  of  CUR  factorizations  for  the  TechTC  example,  selecting  rows  and  columns  using  top  lever¬ 
age  scores  for  all  singular  vectors  and  the  leading  two  singular  vectors,  and  DEIM. 


DEIM  column  indices  •  leverage  scoresl 


Figure  1 1 :  The  columns  selected  by  DEIM  for  the  TechTC  example,  compared  to  leverage  scores  from  the  leading 
two  singular  vectors. 

Example  3.  Tumor  detection  in  genetics  data 

Our  final  example  uses  the  GSE10072  cancer  genetics  data  set  from  the  National  Institutes  of  Health,  previously 
investigated  by  Kundu,  Nambirijan,  and  Drineas  [19].  The  matrix  A  e  K22'283x  m‘  contains  data  for  22,283  probes 
applied  to  107  patients.  The  (j,  k)  entry  of  A  reflects  how  strongly  patient  k  responded  to  probe  j.  This  experiment 
seeks  probes  that  segment  the  population  into  two  clusters:  the  58  patients  with  tumors,  and  the  49  without.7  To 

7The  data  is  available  from  http  :  //www .  ncbi  .  nlm.  nih .  gov/ geo  /query/  acc .  cgi?acc=GSE10072. 
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Table  1:  The  features  selected  by  DEIM-CUR  for  the  TechTC  data  set,  compared  to  the  (scaled)  leverage  scores 
using  the  leading  two  singular  vectors,  and  all  singular  vectors. 


DEIM 

rank 

index 

<h 

LS  (2) 

rank  score 

LS  (all) 
rank  score 

feature 

1 

10973 

1 

1.000 

4 

0.875 

evansville 

2 

1 

2 

0.741 

8 

0.726 

florida 

3 

1547 

13 

0.031 

2 

0.948 

spacer 

4 

109 

8 

0.055 

66 

0.347 

contact 

5 

209 

12 

0.040 

32 

0.458 

service 

6 

50 

4 

0.116 

6 

0.739 

miami 

7 

824 

46 

0.007 

5 

0.809 

chapter 

8 

1841 

33 

0.010 

20 

0.537 

health 

9 

171 

5 

0.113 

13 

0.617 

information 

10 

234 

16 

0.026 

37 

0.436 

events 

11 

595 

84 

0.004 

15 

0.576 

church 

12 

60 

15 

0.026 

67 

0.347 

email 

13 

945 

10 

0.047 

30 

0.474 

services 

14 

1670 

129 

0.002 

1 

1.000 

bullet 

15 

216 

35 

0.009 

38 

0.430 

music 

16 

78 

3 

0.246 

24 

0.492 

south 

17 

213 

19 

0.018 

110 

0.259 

their 

18 

138 

14 

0.030 

43 

0.408 

please 

19 

6110 

7 

0.060 

95 

0.280 

indiana 

20 

1152 

70 

0.005 

152 

0.221 

member 

center  the  data,  we  subtract  the  mean  of  each  row  from  all  entries  in  that  row.  As  shown  in  [19],  the  leading  two 
principal  vectors  of  this  matrix  segment  the  population  very  well. 

Like  the  TechTC  data,  the  singular  values  of  A  decay  slowly,  as  seen  in  Figure  12.  Once  again  the  DEIM-CUR 
procedure  produces  a  more  accurate  low-rank  approximation  than  obtained  by  selecting  the  rows  and  columns  with 
highest  leverage  scores,  whether  those  are  computed  using  all  the  singular  vectors,  or  just  the  leading  two  or  ten. 

Table  2  reports  the  first  15  rows  selected  by  the  DEIM-CUR  process,  along  with  the  corresponding  leverage 
scores  based  on  two,  ten,  and  all  singular  vectors.  Do  the  probes  selected  by  DEIM  discriminate  the  patients  with 
tumors  (“sick”)  from  those  without  (“well”)?  To  investigate,  for  each  selected  probe  we  count  the  number  of  large 
positive  entries  corresponding  to  sick  and  well  patients.8  Some  but  not  all  of  the  DEIM-CUR  probes  effectively 
select  only  sick  or  well  patients.  Contrast  these  results  with  Table  3,  which  shows  the  probes  with  highest  leverage 
scores  (drawn  from  the  leading  two  singular  vectors).  Only  four  of  these  probes  were  also  selected  by  the  DEIM 
procedure  (even  if  we  continue  the  DEIM  procedure  to  select  the  maximum  number,  n  =  107,  of  indices).  This 
discrepancy  is  quite  different  from  the  good  agreement  between  DEIM  and  leverage  score  indices  for  the  TechTC 
data  in  Table  1,  despite  the  similar  dimensions  and  the  comparably  slow  decay  of  the  singular  values. 

While  the  rows  selected  from  leverage  scores  did  not  produce  as  accurate  an  approximation,  ||  A  —  C/.U/R/,.  ||, 

sIn  particular,  we  call  an  entry  of  the  mean-centered  matrix  A  large  if  its  value  exceeds  one.  Of  the  22,283  probes,  for  only  23  probes 
do  at  least  30  of  the  58  sick  patients  have  such  large  entries;  for  only  95  probes  do  at  least  30  of  the  49  well  patients  have  large  entries. 
There  is  no  overlap  between  the  probes  that  are  strongly  expressed  by  the  sick  and  well  patients. 
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4— LS  (all)  LS  (10)  — a— LS  (2)  — DEIM  — — Cfc  +  i 


Figure  12:  Accuracy  of  CUR  factorizations  for  a  genetics  data  set.  DEIM-CUR  consistently  outperforms  factor¬ 
izations  derived  by  taking  the  rows  and  columns  with  largest  leverage  scores,  regardless  of  whether  these  scores 
are  drawn  from  all  singular  vectors,  the  leading  ten  singular  vectors,  or  the  leading  two  singular  vectors. 


Table  2:  Genetics  example:  the  probes  selected  by  DEIM-CUR,  compared  to  the  (scaled)  leverage  scores  using 
the  leading  two  singular  vectors,  ten  singular  vectors,  and  all  singular  vectors. 


DEIM 

rank 

index 

Qj 

probe 

set 

gene 

name 

number 

sick 

number 

well 

LS  (2) 

rank  score 

LS  (10) 
rank  score 

LS  (all) 
rank  score 

1 

9565 

2 1008  Eat 

AGER 

2 

45 

1 

1.000 

45 

0.504 

386 

0.123 

2 

14270 

214895_s_at 

ADAM  10 

8 

3 

211 

0.173 

1171 

0.108 

3344 

0.036 

3 

8650 

209156_s_at 

COL6A2 

5 

6 

15156 

0.005 

252 

0.245 

708 

0.091 

4 

11057 

211653_x_at 

AKR1C2 

18 

1 

6440 

0.017 

11 

0.656 

146 

0.185 

5 

14153 

214777_at 

IGKV4-1 

27 

3 

281 

0.148 

19 

0.607 

106 

0.209 

6 

18976 

219612_s_at 

FGG 

17 

17 

2591 

0.039 

2 

0.956 

4 

0.825 

7 

3831 

204304 _s_at 

PROM1 

16 

4 

992 

0.073 

70 

0.417 

32 

0.345 

8 

3351 

203824_at 

TSPAN8 

17 

4 

9687 

0.011 

21 

0.582 

31 

0.355 

9 

4275 

204748_at 

PTGS2 

18 

14 

424 

0.118 

13 

0.624 

42 

0.313 

10 

1437 

201909 _at 

RPS4Y1 

21 

34 

8232 

0.013 

3 

0.913 

5 

0.736 

11 

14150 

214774_x_at 

TOX3 

34 

0 

95 

0.262 

49 

0.492 

102 

0.210 

12 

10518 

211074_at 

FOLR1 

7 

4 

9482 

0.011 

926 

0.124 

213 

0.159 

13 

9580 

210096_at 

CYP4B1 

6 

44 

8 

0.797 

65 

0.431 

54 

0.284 

14 

4002 

204475_at 

MMP1 

27 

0 

34 

0.406 

24 

0.564 

21 

0.465 

15 

13990 

214612_x_at 

MAGEA 

16 

0 

489 

0.110 

134 

0.323 

35 

0.339 

as  DEIM,  these  probes  do  a  much  more  effective  job  of  discriminating  patients  with  tumors  from  those  without. 
Indeed,  for  14  of  the  top  15  probes,  the  tumor-free  patients  express  strongly,  while  the  patients  with  tumors  do  not; 
in  the  remaining  case,  the  opposite  occurs. 
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Table  3:  Genetics  example:  the  probes  with  top  (scaled)  leverage  scores,  derived  from  the  first  two  singular  vectors. 


LS  (2) 
rank 

index 

Qj 

LS  (2) 

score 

probe 

set 

gene 

name 

number 

sick 

number 

well 

DEIM 

rank 

1 

9565 

1.000 

210081_at 

AGER 

2 

45 

1 

2 

13766 

0.922 

214387_x_at 

SFTPC 

6 

48 

— 

3 

11135 

0.907 

211735_x_at 

SFTPC 

5 

48 

73 

4 

9361 

0.899 

209875  _s_at 

SPP1 

50 

2 

— 

5 

5509 

0.896 

205982_x_at 

SFTPC 

5 

48 

— 

6 

9103 

0.835 

209613_s_at 

ADH1B 

2 

47 

— 

7 

14827 

0.834 

215454_x_at 

SFTPC 

0 

46 

— 

8 

9580 

0.797 

210096_at 

CYP4B1 

6 

44 

13 

9 

4239 

0.754 

20471 2 _at 

WIFI 

5 

43 

70 

10 

3507 

0.724 

203980_at 

FABP4 

2 

44 

— 

11 

18594 

0.717 

219230_at 

TMEM100 

2 

38 

— 

12 

9102 

0.684 

209612_s_at 

ADH1B 

2 

46 

— 

13 

13514 

0.626 

214135_at 

CFDN18 

3 

47 

— 

14 

5393 

0.626 

205866_at 

FCN3 

0 

39 

— 

15 

4727 

0.614 

205200 _at 

CFEC3B 

0 

39 

— 

7  Conclusions 

The  Discrete  Empirical  Interpolation  Method  (DEIM)  is  an  index  selection  procedure  that  gives  simple,  deter¬ 
ministic  CUR  factorizations  of  the  matrix  A.  Since  DEIM  utilizes  (approximate)  singular  vectors,  we  propose 
an  effective  one-pass  incremental  approximate  QR  factorization  that  can  efficiently  compute  dominant  singular 
vectors  for  data  sets  with  rapidly  decaying  singular  values;  this  method  could  prove  useful  in  a  variety  of  other 
settings.  The  accuracy  of  the  resulting  rank -A-  CUR  factorization  can  be  bounded  in  terms  of  07.  + 1 ,  the  error  in  the 
best  rank-A:  approximation  to  A.  Our  analysis  of  the  2-norm  error  ||  A  —  CUR|j  applies  to  all  CUR  approxima¬ 
tions  that  use  the  optimal  central  factor  U  =  C]ARJ,  and  hence  can  give  insight  into  the  performance  of  other 
index  selection  algorithms,  such  as  leverage  scores,  uniform  random  sampling,  or  column-pivoted  QR  factoriza¬ 
tion.  Numerical  examples  illustrate  that  the  DEIM-CUR  approach  can  deliver  very  good  low  rank  approximations, 
compared  to  row  selection  based  on  dominant  leverage  scores. 
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